16 research outputs found

    metric-learn: Metric Learning Algorithms in Python

    Get PDF
    GitHub repository: https://github.com/scikit-learn-contrib/metric-learn
    metric-learn is an open source Python package implementing supervised and weakly-supervised distance metric learning algorithms. As part of scikit-learn-contrib, it provides a unified interface compatible with scikit-learn, which makes it easy to perform cross-validation, model selection, and pipelining with other machine learning estimators. metric-learn is thoroughly tested and available on PyPI under the MIT license.

    Pour plus de transparence dans l’analyse automatique des consultations ouvertes : leçons de la synthèse du Grand Débat National

    Get PDF
    Faced with the limits of representative democracy, digital public consultations give citizens an opportunity to contribute their opinions and ideas, and give policy makers a way to involve the population more closely in public decision making. The design and deployment of such consultations raise well-known issues, such as potential biases in the questions or in the representativeness of the participants. In this article, we consider the novel issues that arise from using artificial intelligence methods to automatically analyze contributions in natural language. Such analysis is a difficult problem for which many approaches exist, relying on various assumptions and models. Taking the responses to the open-ended questions of the French "Grand Débat National" as a case study, we show that it is impossible to reproduce the results of the official analysis commissioned by the government. In addition, we identify a number of implicit and arbitrary choices in the official analysis that cast doubt on some of its results. We also show that different methods can lead to different conclusions. Our study highlights the need for greater transparency in the automatic analysis of public consultations, so as to ensure reproducibility and public confidence in their results. We conclude with suggestions for improving digital public consultations and their analysis so that they encourage participation and become useful tools for public debate.

    Enhancing speech privacy with slicing

    Get PDF
    Privacy preservation calls for speech anonymization methods which hide the speaker's identity while minimizing the impact on downstream tasks such as automatic speech recognition (ASR) training or decoding. In the recent VoicePrivacy 2020 Challenge, several anonymization methods were proposed to transform speech utterances in a way that preserves their verbal and prosodic content while reducing the accuracy of a speaker verification system. In this paper, we propose to further increase the privacy achieved by such methods by segmenting the utterances into shorter slices. We show that our approach has two major impacts on privacy. First, it reduces the accuracy of speaker verification with respect to unsegmented utterances. Second, it also reduces the amount of personal information that can be extracted from the verbal content, in a way that cannot easily be reversed by an attacker. We also show that it is possible to train an ASR system from anonymized speech slices with negligible impact on the word error rate.
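    The slicing idea above amounts to cutting each utterance into short consecutive segments before further processing. A minimal sketch, assuming the utterance is a 1-D NumPy waveform and the slice length is given in samples (the function name and parameter values are illustrative, not from the paper):

```python
import numpy as np

def slice_utterance(waveform: np.ndarray, slice_len: int) -> list:
    """Split a 1-D waveform into consecutive slices of at most slice_len samples."""
    return [waveform[i:i + slice_len]
            for i in range(0, len(waveform), slice_len)]

# Example: a 3.2-second utterance at 16 kHz cut into 1-second slices.
utt = np.zeros(int(3.2 * 16000))
slices = slice_utterance(utt, 16000)
print(len(slices))  # 4 slices: three full seconds plus a 0.2 s remainder
```

    In the paper's setting each slice would then be anonymized and treated independently, so an attacker can no longer link slices back to a single long utterance as easily.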

    A comparative study of speech anonymization metrics

    Get PDF
    Speech anonymization techniques have recently been proposed for preserving speakers' privacy. They aim at concealing speakers' identities while preserving the spoken content. In this study, we compare three metrics proposed in the literature to assess the level of privacy achieved. We exhibit through simulation the differences and blind spots of some metrics. In addition, we conduct experiments on real data and state-of-the-art anonymization techniques to study how they behave in a practical scenario. We show that the application-independent log-likelihood-ratio cost function (C_llr^min) provides a more robust evaluation of privacy than the equal error rate (EER), and that detection-based metrics provide different information from linkability metrics. Interestingly, the results on real data indicate that current anonymization design choices do not induce a regime where the differences between those metrics become apparent.
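    The two metrics compared above can be computed from speaker-verification scores for target (same-speaker) and non-target (different-speaker) trials. Below is a minimal NumPy sketch of the EER and of the raw C_llr cost; note that C_llr^min additionally requires optimal score calibration (e.g. via the PAV algorithm), which is omitted here, and the score distributions are synthetic:

```python
import numpy as np

def eer(target_scores, nontarget_scores):
    """Equal error rate: operating point where false-reject and false-accept rates meet."""
    thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))
    fr = np.array([np.mean(target_scores < t) for t in thresholds])     # false rejects
    fa = np.array([np.mean(nontarget_scores >= t) for t in thresholds]) # false accepts
    i = np.argmin(np.abs(fr - fa))
    return (fr[i] + fa[i]) / 2

def cllr(target_scores, nontarget_scores):
    """Log-likelihood-ratio cost, interpreting scores as log-likelihood ratios."""
    c_tar = np.mean(np.log2(1 + np.exp(-np.asarray(target_scores))))
    c_non = np.mean(np.log2(1 + np.exp(np.asarray(nontarget_scores))))
    return (c_tar + c_non) / 2

# Synthetic, well-separated score distributions for illustration.
rng = np.random.default_rng(0)
tar = rng.normal(2.0, 1.0, 1000)   # target trials: higher scores
non = rng.normal(-2.0, 1.0, 1000)  # non-target trials: lower scores
print(eer(tar, non), cllr(tar, non))
```

    For a privacy evaluation, higher EER and C_llr values mean the attacker's verification system is less able to re-identify speakers, which is why these costs serve as privacy metrics.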
